Loading the Data Warehouse Across Various Parallel Architectures

Author

  • Vijay Raghavan
Abstract

Loading data is one of the most critical operations in any data warehouse, yet it is also the one most neglected by database vendors. Data must be loaded into a warehouse in a fixed batch window, typically overnight. During this period, we need to take maximum advantage of the machine resources to load data as efficiently as possible. A data warehouse can be online for up to 20 hours a day, which can leave a window of only 4 hours to complete the load. The Red Brick loader can validate, load and index up to 12 GB of data per hour on an SMP system.
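The figures in the abstract imply a simple capacity bound, sketched below in Python. This is only an illustration: the function names are hypothetical, and the calculation assumes the quoted "up to 12 GB per hour" rate is sustained for the whole window, so the result is an upper bound rather than an expected throughput.

```python
def load_window_hours(online_hours: float, day_hours: float = 24.0) -> float:
    """Hours left for loading when the warehouse is online for `online_hours` a day."""
    return day_hours - online_hours


def max_load_gb(window_hours: float, rate_gb_per_hour: float) -> float:
    """Upper bound on data loaded in one batch window at a sustained peak rate."""
    return window_hours * rate_gb_per_hour


# Figures taken from the abstract: 20 hours online, up to 12 GB/hour.
window = load_window_hours(20)
capacity = max_load_gb(window, 12)
print(f"Batch window: {window} h, upper bound on load: {capacity} GB")
```

Under these assumptions, a nightly load larger than the computed bound would not fit in the window at the quoted single-system rate.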


Similar resources

Elastic Performance for Etl+q Processing

Most data warehouse deployments are not prepared to scale automatically, although some applications have large or increasing requirements concerning data volume, processing times, data rates, freshness and need for fast responses. The solution is to use parallel architectures and mechanisms to speed up data integration and to handle fresh data efficiently. Those parallel approaches should scale...


Improving the Extract, Transform, and Load Process in an Analytical Database Using Parallel Processing

Data warehouses are used to store data in a structure that facilitates data analysis. The process of Extracting, Transforming, and Loading (ETL) covers retrieving the required data from the source system and loading it into the data warehouse. Although the structure of source data (e.g. ER model) and DW (e.g. star schema) are usually specified, there is a clear lack of a ...


Distributed Warehouses: A Review on Design Methods and Recent Trends

The distributed data warehouse supports the decision makers by providing a single view of data even though that data is physically distributed across multiple data warehouses in multiple systems at different branches. This environment has changed the face of computing and offered quick and precise solutions for a variety of complex problems for different fields. This paper reviews distributed d...


Efficient ETL+Q for Automatic Scalability in Big or Small Data Scenarios

In this paper, we investigate the problem of providing scalability to the data Extraction, Transformation, Load and Querying (ETL+Q) process of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically. Parallel architectures and mechanisms are able to optimize the ETL process by speeding up each part of the pipeline process as mor...


A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

In today’s scenario, extraction-transformation-loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. The task of carrying out the ETL process is potentially complex, hard and time-consuming. Organisations nowadays are concerned about vast qualities of data. The data quality is concerned with technical issue...




Publication date: 1996